feat: flexible web driver and proxy broker in fetch node #211
…functionality for proxy rotation. The broker has been made fully configurable for anonymity level, admissible locations, scheme, and max shape, so as not to waste resources, unlike the original `free-proxy` package. Other options were explored (e.g., `proxybroker`, `proxybroker2`) for their built-in proxy server and rotation capabilities, but the former is no longer maintained and the latter has issues with any Python version other than 3.9.
…eb driver with proxy protection and flexible kwargs and backend. The original class prevents passing kwargs down to the playwright backend, making some configurations unfeasible, including passing a proxy server to the web driver. The new class is backward compatible with the original, but 1) allows any kwarg to be passed down to the web driver, 2) allows specifying the web driver backend (only playwright is supported for now), in case more (e.g., selenium) are supported in the future, and 3) automatically fetches a suitable proxy if one is not already passed.
…nto fix/fetch-node-proxybroker
…rhs correctly to the node
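For readers skimming the commits, here is a minimal sketch of the pattern the second commit message describes: a loader that forwards arbitrary kwargs to the web driver, validates the backend name, and fetches a proxy only when none is passed. `FlexibleLoader`, `proxy_finder`, and `launch_options` are hypothetical names chosen for illustration, not the class added in this PR, and no real browser is launched.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumption: mirrors the commit message, which says only playwright is supported for now.
SUPPORTED_BACKENDS = {"playwright"}


class FlexibleLoader:
    """Hypothetical loader that keeps arbitrary kwargs for the web driver backend."""

    def __init__(
        self,
        urls: List[str],
        *,
        backend: str = "playwright",
        proxy: Optional[Dict[str, Any]] = None,
        proxy_finder: Optional[Callable[[], Dict[str, Any]]] = None,
        **driver_kwargs: Any,
    ) -> None:
        if backend not in SUPPORTED_BACKENDS:
            raise ValueError(f"unsupported backend: {backend!r}")
        self.urls = list(urls)
        self.backend = backend
        # Use the caller's proxy if given; otherwise ask the finder for one.
        if proxy is None and proxy_finder is not None:
            proxy = proxy_finder()
        self.proxy = proxy
        self.driver_kwargs = driver_kwargs

    def launch_options(self) -> Dict[str, Any]:
        """Options that would be handed to the backend's browser launch call."""
        options = dict(self.driver_kwargs)
        if self.proxy is not None:
            options["proxy"] = self.proxy
        return options


loader = FlexibleLoader(
    ["https://example.com"],
    headless=True,
    proxy_finder=lambda: {"server": "http://203.0.113.7:8080"},
)
options = loader.launch_options()
```

Keeping the kwargs in a plain dict means new driver options need no change to the loader's signature, which is the backward-compatibility point made in the commit message.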
examples/openai/proxy.py
proxies = search_proxy_servers(
    anonymous=True,
    countryset={"IT"},
    # secure=True,
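I guess commenting out `secure=True` means no proxies were being found.
That's something I'd document well: strict security requirements and short timeouts work against each other, so the stricter the criteria, the longer a search may need before it returns anything.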
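To make the trade-off concrete, here is a small self-contained sketch of criteria-based filtering. It is not the implementation of `search_proxy_servers`: `Candidate` and `filter_candidates` are hypothetical, the addresses come from a documentation-only IP range, and the parameter names simply reuse those visible in the example above (`anonymous`, `countryset`, `secure`, `max_shape`).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one proxy server; not the library's own type."""

    address: str
    country: str
    anonymous: bool
    secure: bool  # True when the proxy supports HTTPS


def filter_candidates(
    candidates: Iterable[Candidate],
    *,
    anonymous: bool = False,
    countryset: Optional[Set[str]] = None,
    secure: bool = False,
    max_shape: int = 5,
) -> List[str]:
    """Return up to max_shape addresses that satisfy every requested criterion."""
    matches: List[str] = []
    for candidate in candidates:
        if anonymous and not candidate.anonymous:
            continue
        if countryset and candidate.country not in countryset:
            continue
        if secure and not candidate.secure:
            continue
        matches.append(candidate.address)
        if len(matches) >= max_shape:
            break  # stop early so no further candidates are examined
    return matches


pool = [
    Candidate("203.0.113.1:8080", "IT", anonymous=True, secure=False),
    Candidate("203.0.113.2:3128", "IT", anonymous=True, secure=False),
    Candidate("203.0.113.3:8080", "DE", anonymous=True, secure=True),
]

relaxed = filter_candidates(pool, anonymous=True, countryset={"IT"})
strict = filter_candidates(pool, anonymous=True, countryset={"IT"}, secure=True)
```

With this made-up pool, the relaxed query returns both Italian proxies while the strict one returns an empty list, which is the situation the comment above guesses at.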
      A 'playwright' compliant proxy configuration.
      """
-     server = search_proxy_servers(max_shape=1, **proxy.get("criteria", {}))[0]
+     server = search_proxy_servers(**proxy.get("criteria", {}))[0]
The only reason I had forced `max_shape=1` is that this function will only be called by the fetch node; users looking for proxies can call `search_proxy_servers` directly with the parameters they want.
`_search_proxy` is called only by `parse_or_search_proxy`, which was designed for the fetch node and specifically aims to produce a single proxy server satisfying the criteria (the fetch node won't use more than one).
I see. I had removed it because it conflicted with the `max_shape` specified by the user, if present. I have now added it back and drop any user-specified `max_shape` before the call. I will make a new merge.
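For context, a minimal sketch of that fix. `search_stub` is a stand-in for the real search function and `pick_single_server` is a hypothetical helper name; only the pattern of dropping the user's `max_shape` before forcing `max_shape=1` is taken from the discussion above.

```python
from typing import Any, Dict, List


def search_stub(max_shape: int = 5, **criteria: Any) -> List[str]:
    """Stand-in for the real search; returns max_shape placeholder addresses."""
    return [f"203.0.113.{i}:8080" for i in range(1, max_shape + 1)]


def pick_single_server(proxy: Dict[str, Any]) -> str:
    """Return exactly one server, ignoring any user-supplied max_shape."""
    criteria = dict(proxy.get("criteria", {}))  # copy so the caller's dict is untouched
    criteria.pop("max_shape", None)  # avoid passing max_shape twice
    return search_stub(max_shape=1, **criteria)[0]


user_proxy = {"criteria": {"anonymous": True, "max_shape": 10}}
server = pick_single_server(user_proxy)
```

Without the `pop`, the call would pass `max_shape` both explicitly and through `**criteria`, and Python raises a `TypeError` for the duplicated keyword argument.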
Thanks! Writing docs.
🎉 This PR is included in version 0.11.0-beta.5 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version 0.11.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
Before merging, I think we should make sure that:
I cannot run Ollama until the end of the weekend, so someone else should take care of testing the fetch node functionality.